End-to-End Cross-Modality Retrieval with CCA Projections and Pairwise Ranking Loss
Cross-modality retrieval encompasses retrieval tasks where the fetched items
are of a different type than the search query, e.g., retrieving pictures
relevant to a given text query. The state-of-the-art approach to cross-modality
retrieval relies on learning a joint embedding space of the two modalities,
where items from either modality are retrieved using nearest-neighbor search.
In this work, we introduce a neural network layer based on Canonical
Correlation Analysis (CCA) that learns better embedding spaces by analytically
computing projections that maximize correlation. In contrast to previous
approaches, the CCA Layer (CCAL) allows us to combine existing objectives for
embedding space learning, such as pairwise ranking losses, with the optimal
projections of CCA. We show the effectiveness of our approach for
cross-modality retrieval on three different scenarios (text-to-image,
audio-sheet-music and zero-shot retrieval), surpassing both Deep CCA and a
multi-view network using freely learned projections optimized by a pairwise
ranking loss, especially when little training data is available (the code for
all three methods is released at: https://github.com/CPJKU/cca_layer).
Comment: Preliminary version of a paper published in the International Journal
of Multimedia Information Retrieval.
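As a rough illustration of what "analytically computing projections that maximize correlation" can mean, the following is a minimal NumPy sketch of classical linear CCA on two already-extracted feature matrices. It is not the CCA layer from the linked repository: the function name `cca_projections`, the regularisation constant `reg`, and the whitening-plus-SVD formulation are assumptions chosen for this example.

```python
import numpy as np


def cca_projections(x, y, reg=1e-4):
    """Linear CCA sketch: projection matrices and canonical correlations.

    x: (n_samples, dx) features of the first view
    y: (n_samples, dy) features of the second view
    reg: small ridge term (an assumption) that keeps the covariances invertible
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]

    # Regularised within-view covariances and the cross-covariance.
    sxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    syy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    sxy = x.T @ y / (n - 1)

    def inv_sqrt(s):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, v = np.linalg.eigh(s)
        return v @ np.diag(w ** -0.5) @ v.T

    sxx_is, syy_is = inv_sqrt(sxx), inv_sqrt(syy)

    # Singular values of the whitened cross-covariance are the
    # canonical correlations, sorted in descending order.
    t = sxx_is @ sxy @ syy_is
    u, corr, vt = np.linalg.svd(t, full_matrices=False)

    a = sxx_is @ u      # projection for view x
    b = syy_is @ vt.T   # projection for view y
    return a, b, corr
```

Projecting the centred features with `a` and `b` yields coordinates in a shared space whose leading dimensions are maximally correlated across the two views; a ranking loss, as the abstract describes, would then be computed on those projected coordinates.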
Reinforcement Learning Based Power Grid Day-Ahead Planning and AI-Assisted Control
The ongoing transition to renewable energy is increasing the share of
fluctuating power sources like wind and solar, raising power grid volatility
and making grid operation increasingly complex and costly. In our prior work,
we have introduced a congestion management approach consisting of a
redispatching optimizer combined with a machine learning-based topology
optimization agent. Compared to a typical redispatching-only agent, it was able
to keep a simulated grid in operation longer while at the same time reducing
operational cost. Our approach also ranked 1st in the L2RPN 2022 competition
initiated by RTE, Europe's largest grid operator. The aim of this paper is to
bring this promising technology closer to the real world of power grid
operation. We deploy RL-based agents in two settings resembling established
workflows, AI-assisted day-ahead planning and realtime control, in an attempt
to show the benefits and caveats of this new technology. We then analyse
congestion, redispatching and switching profiles, and perform an elementary
sensitivity analysis that provides a glimpse of operational robustness. While there is still a
long way to a real control room, we believe that this paper and the associated
prototypes help to narrow the gap and pave the way for a safe deployment of RL
agents in tomorrow's power grids.
Learning Audio–Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification
This work addresses the problem of matching musical audio directly to sheet music, without any higher-level abstract representation. We propose a method that learns joint embedding spaces for short excerpts of audio and their respective counterparts in sheet music images, using multimodal convolutional neural networks. Given the learned representations, we show how to utilize them for two sheet-music-related tasks: (1) piece/score identification from audio queries and (2) retrieving relevant performances given a score as a search query. All retrieval models are trained and evaluated on a new, large-scale multimodal audio–sheet music dataset, which is made publicly available along with this article. The dataset comprises 479 precisely annotated solo piano pieces by 53 composers, for a total of 1,129 pages of music and about 15 hours of aligned audio, which was synthesized from these scores. Going beyond this synthetic training data, we carry out first retrieval experiments using scans of real sheet music of high complexity (e.g., nearly the complete solo piano works by Frederic Chopin) and commercial recordings by famous concert pianists. Our results suggest that the proposed method, in combination with the large-scale dataset, yields retrieval models that successfully generalize to data well beyond the synthetic training data used for model building.
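Once both modalities live in a joint embedding space, retrieval reduces to ranking candidates by their similarity to the query embedding. The sketch below assumes the embeddings have already been computed by some model and uses cosine similarity; the function name `rank_by_cosine` and the choice of similarity measure are assumptions made for this example and are not taken from the article.

```python
import numpy as np


def rank_by_cosine(query, candidates):
    """Rank candidate embeddings by cosine similarity to a query embedding.

    query: (d,) embedding of the query (e.g. an audio excerpt)
    candidates: (n, d) embeddings of the other modality (e.g. sheet music snippets)
    Returns candidate indices, best match first, and the matching similarities.
    """
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sims = c @ q
    order = np.argsort(-sims)  # descending similarity
    return order, sims[order]
```

For piece identification, one plausible design is to run this ranking for every excerpt of a query recording and aggregate the votes per piece; that aggregation step is left out here.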
Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution
Reinforcement Learning algorithms require a large number of samples to solve
complex tasks with sparse and delayed rewards. Complex tasks can often be
hierarchically decomposed into sub-tasks. A step in the Q-function can be
associated with solving a sub-task, where the expectation of the return
increases. RUDDER has been introduced to identify these steps and then
redistribute reward to them, thus immediately giving reward if sub-tasks are
solved. Since the problem of delayed rewards is mitigated, learning is
considerably sped up. However, for complex tasks, current exploration
strategies as deployed in RUDDER struggle with discovering episodes with high
rewards. Therefore, we assume that episodes with high rewards are given as
demonstrations and do not have to be discovered by exploration. Typically, the
number of demonstrations is small, and RUDDER's LSTM model, being a deep learning
method, does not learn well from so few examples. Hence, we introduce Align-RUDDER, which is RUDDER
with two major modifications. First, Align-RUDDER assumes that episodes with
high rewards are given as demonstrations, replacing RUDDER's safe exploration
and lessons replay buffer. Second, we replace RUDDER's LSTM model by a profile
model that is obtained from multiple sequence alignment of demonstrations.
Profile models can be constructed from as few as two demonstrations, as is known
from bioinformatics. Align-RUDDER inherits the concept of reward
redistribution, which considerably reduces the delay of rewards, thus speeding
up learning. Align-RUDDER outperforms competitors on complex artificial tasks
with delayed reward and few demonstrations. On the Minecraft ObtainDiamond
task, Align-RUDDER is able to mine a diamond, though not frequently. GitHub:
https://github.com/ml-jku/align-rudder, YouTube: https://youtu.be/HO-_8ZUl-U
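The idea of reward redistribution can be illustrated with a small sketch: given a per-step score that rises whenever a sub-task is solved, the redistributed reward at each step is the increase in that score, rescaled so that the episode still sums to its original return. The function name `redistribute_reward` and the input `scores` are hypothetical; in Align-RUDDER the scores would come from the alignment-based profile model, which is not implemented here.

```python
import numpy as np


def redistribute_reward(scores, episode_return):
    """Turn a delayed episode return into per-step rewards.

    scores: per-step progress scores (hypothetical stand-in for the
            output of a profile model or return predictor)
    episode_return: the original, delayed return of the episode
    """
    scores = np.asarray(scores, dtype=float)
    # Reward at step t is the score increase from step t-1 to t.
    diffs = np.diff(scores, prepend=0.0)
    total = diffs.sum()
    if np.isclose(total, 0.0):
        # No score change to attribute: spread the return uniformly.
        return np.full(len(scores), episode_return / len(scores))
    # Rescale so the redistributed rewards sum to the original return.
    return diffs * (episode_return / total)
```

For example, with scores `[0, 0, 1, 1, 3]` and a return of 6, all reward lands on the third and fifth steps, where the score jumps, instead of arriving only at the end of the episode.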
Feature-combination hybrid recommender systems for automated music playlist continuation
Music recommender systems have become a key technology to support the interaction of users with the increasingly large music catalogs of on-line music streaming services, on-line music shops, and personal devices. An important task in music recommender systems is the automated continuation of music playlists, which enables the recommendation of music streams adapting to given (possibly short) listening sessions. Previous works have shown that applying collaborative filtering to collections of curated music playlists reveals underlying playlist-song co-occurrence patterns that are useful to predict playlist continuations. However, most music collections exhibit a pronounced long-tailed distribution. The majority of songs occur in only a few playlists and, as a consequence, they are poorly represented by collaborative filtering. We introduce two feature-combination hybrid recommender systems that extend collaborative filtering by integrating the collaborative information encoded in curated music playlists with any type of song feature vector representation. We conduct off-line experiments to assess the performance of the proposed systems to recover withheld playlist continuations, and we compare them to competitive pure and hybrid collaborative filtering baselines. The results of the experiments indicate that the introduced feature-combination hybrid recommender systems can more accurately predict fitting playlist continuations as a result of their improved representation of songs occurring in few playlists.